(54) A method of searching documents and a service for searching documents



(57) This invention aims to provide an efficient means of
performing a document search in which the relevance between
plural document databases is examined. A summary making
module (132) and a search module (133) are provided for a
document database (131), and these are connected to a
network (12) as a server (13). A client (11) specifies a set
of documents in the specified document database and obtains
a summary of that set from the server (13). The summary
obtained is sent to another server (14), and a search is
performed according to the summary in a document database
(141) in the server (14) to which the summary is
transferred.
Description

Background of the Invention 

[0001] This invention relates to a document searching
method for changing over between plural document
databases, and constructing relationships between plural
document databases.

[0002] As more and more document information is 
converted to electronic format, a greater need is emerg- 
ing to search different types of document database 
simultaneously. For instance, users often wish to look 
up dictionaries relating to newspaper articles which they 
may find of interest. 

[0003] In the past, it was possible to perform a 
search independently by changing over between plural 
document databases, but there was no way of examin- 
ing the relevancy of sets of documents in other data- 
bases to a set of documents in one particular database. 
[0004] If, however, the search is limited to a single
document database, it is possible to search for other
document sets within that database. In this case, sufficient
search speed is often obtained by calculating the relevance
between documents before searching. Even with different
databases, plural document databases can be searched at the
same time if such a calculation is performed beforehand.
However, the amount of calculation grows with the number of
combinations of databases as more databases are added, so
this method is not realistic.

[0005] It is also possible to first analyze the set of 
key documents on the user side to compose a search 
input, and then search in other document databases by 
using the input, but in this case, the user side has to 
receive all the information about the set of key documents,
and if the document databases are on a network, the amount
of traffic would be huge.

Summary of the Invention 

[0006] It is therefore an object of this invention to 
resolve the problems inherent in existing technology by 
allowing a user to specify an arbitrary set of documents 
in an arbitrary document database, and to efficiently 
search sets of documents relating to this set of docu- 
ments from within any particular database. 
[0007] When there is a large search input as in the 
case of a set of key documents, instead of using all the 
information in the search input, it is faster to perform a 
search using only topic words of the search input as a 
summary, and this also reduces the load on the net- 
work. In the context of this specification, "summary" 
means a "set of topic words for a set of documents". 
[0008] The document databases are located on servers on a
network. Each server comprises a module for building a
summary by selecting topic words for a set of documents
within the document database, and a module for performing a
search on any arbitrary summary.



[0009] A user who performs a search specifies a set
of documents via a client to a server in which a source
document database is stored, and receives a summary.
[0010] Next, the summary is sent to a server where
a target document database to be searched is located,
and a search result is received.
[0011] As the search interface of the client, a display
area for a set of documents is first provided, in which the
required set of key documents can be specified and the
database to be searched can be selected. In the client, the
user then selects an interesting set of documents from among
the set of documents displayed in this display area, and if
necessary, changes over the document database which is to be
searched.
[0012] These and other objects, features and
advantages of the present invention will become more 
apparent in view of the following detailed description of 
the preferred embodiments in conjunction with the 
accompanying drawings. 


Brief Description of the Drawings 
[0013]

Fig. 1 is a diagram showing an example of the overall
construction of a system implementing the plural
document database search method.
Fig. 2 is a diagram showing an example of the
construction of a search assistant interface in a client.

Description of the Preferred Embodiments 

[0014] Fig. 1 shows a typical general arrangement
wherein a client 11 specifies an arbitrary set of key
documents in a document database 131 of a server 13, and
obtains a set of documents having a high relevance
(similarity) to the specified set of key documents from a
document database 141 of another server 14. Herein,
the source and target document databases 131, 141
are located on servers in different places which can be
respectively accessed via a network 12.
[0015] First, the client 11 specifies a set of key
documents in the source document database 131 according
to the user's specification, and sends this information via
the network 12 to the server 13 as a set of document
identifiers, for example an ID attached to each document
which the server 13 can understand. The set of documents is
specified in a window P1 for displaying search results,
described later.

[0016] The server 13 identifies the set of documents
sent from the client. A summary of this set of documents is
then made by a summary making module 132 and sent back to
the client 11 via the network 12. Herein, the term "summary"
means a set of topic words relating to a set of documents.
The summary making module can be constructed by any of the
known methods, such as that disclosed in Japanese Patent
Laid-Open No. Hei 9-62693, "Method of Document
Classification by Probability Model".

[0017] As an example, all the documents in the set for
which a summary is desired are first split into words, and
the word frequencies are totaled. In general, the more
frequently a word appears in a given set of documents, the
better it represents that set, so words with a higher
occurrence frequency in the set are more likely to be
included in the summary. However, general words which appear
often in all documents, such as "do", are not suitable for
the summary. Therefore, words are usually selected by also
considering their frequency of appearance in the document
database to which the set of documents belongs. Specifically,
desirable topic words have a high occurrence frequency in the
specified set of documents but a low frequency in the
document database overall, i.e., they are suitable for a
summary characterizing the set of documents. Hence, a
weighting is calculated for each word from suitable
parameters, using the occurrence frequency in the set of
documents and the occurrence frequency in the document
database as input, and words having a weighting equal to or
greater than a certain threshold are adopted for the summary.

[0018] The higher the weighting, the higher the rel- 
evance of the word to a given document, and the lower 
the weighting, the lower the relevance of the word to the 
document. 
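
The selection of topic words described above can be sketched as
follows. This is only an illustration under stated assumptions and
is not the method of the cited reference: the specification leaves
the "suitable parameters" open, so the weighting used here (the
frequency in the set multiplied by the logarithm of its ratio to the
frequency in the database), the threshold value, the whitespace
tokenizer and the names tokenize and make_summary are all
hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for real word segmentation.
    return text.lower().split()


def make_summary(selected_docs, database_docs, threshold=0.05):
    """Return {topic_word: weighting} for a set of documents.

    selected_docs is assumed to be a subset of database_docs.
    """
    set_counts = Counter(w for doc in selected_docs for w in tokenize(doc))
    db_counts = Counter(w for doc in database_docs for w in tokenize(doc))
    set_total = sum(set_counts.values())
    db_total = sum(db_counts.values())

    summary = {}
    for word, count in set_counts.items():
        freq_in_set = count / set_total
        # Guard in case a selected document is missing from database_docs.
        freq_in_db = max(db_counts[word], count) / max(db_total, set_total)
        # Assumed weighting: high when the word is frequent in the set
        # but comparatively rare in the database as a whole.
        weighting = freq_in_set * math.log(freq_in_set / freq_in_db)
        if weighting >= threshold:
            summary[word] = weighting
    return summary


database = [
    "we do search documents",
    "we do read newspapers",
    "we do cook dinner",
    "we do search dictionaries",
]
selected = [database[0], database[3]]
summary = make_summary(selected, database)
```

In this small example the general word "do" is as frequent in the
selected set as in the whole database, so its weighting is zero and
it is excluded, whereas "search" is retained as a topic word.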

[0019] The server 13 returns a set of words having 
a weighting calculated by the above-mentioned method 
to the client via the network 12. These words are dis- 
played as "topic words" in Fig. 2. 
[0020] Next, at the client 11, users evaluate or process
the summary (the summary of the set of key documents)
received from the server 13, and the client 11 transmits it
to the target server 14 via the network 12.
[0021] In the evaluation or processing performed at the
client, users may for example remove words which are not
deemed to be relevant from the summary, or replace words in
the summary.
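
The evaluation step at the client might be sketched as below,
assuming the summary is held as a mapping from topic word to
weighting. The function name edit_summary is hypothetical, and the
choices that a replacement word inherits the weighting of the word
it replaces, and that the larger weighting is kept on a collision,
are assumptions which the specification does not state.

```python
def edit_summary(summary, remove=(), replace=None):
    """Return a copy of the summary after the user's edits.

    remove  : words the user judged irrelevant
    replace : {old_word: new_word} substitutions chosen by the user
    """
    replace = replace or {}
    edited = {}
    for word, weighting in summary.items():
        if word in remove:
            continue
        new_word = replace.get(word, word)
        # Assumption: if two entries end up as the same word,
        # keep the larger of the two weightings.
        edited[new_word] = max(weighting, edited.get(new_word, 0.0))
    return edited


edited = edit_summary(
    {"search": 0.4, "do": 0.1, "paper": 0.2},
    remove={"do"},
    replace={"paper": "newspaper"},
)
```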

[0022] Using the search module 143, the server 14
calculates the relevance of the documents in the target
document database 141 to the summary of the set of key
documents sent from the client, and returns the identifiers
of documents of high relevance to the client 11 together
with a relevance weighting. The search module here can be
implemented by a keyword search known in the art.
Specifically, as the summary of the set of documents which
is input is a set of words with weightings, these words may
be treated as weighted input keywords and an OR keyword
search performed. In this case, the weighting (relevance) of
each document in the search result can be calculated. This
is done by taking the words which appear both in the summary
and in the document to be searched, calculating an overall
weighting for each word from its weighting in the summary
and its weighting in the document to be searched (e.g., the
product of the two weightings), and then combining the
weightings of all the words (e.g., calculating a sum total)
to obtain the relevance.
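
The relevance calculation described above, using the product and
sum total given as examples, can be sketched as follows. The names
relevance and search, the top_k parameter and the layout of the
index (a per-document mapping from word to weighting, assumed to be
prepared in advance by the server) are hypothetical.

```python
def relevance(summary, doc_weights):
    """Sum, over the shared words, of summary weighting x document weighting."""
    shared = summary.keys() & doc_weights.keys()
    return sum(summary[w] * doc_weights[w] for w in shared)


def search(summary, index, top_k=10):
    """Weighted OR search.

    index maps each document identifier to {word: weighting}.
    A document matches if it shares at least one word with the summary.
    Returns (identifier, relevance) pairs in order of higher relevance.
    """
    scored = []
    for doc_id, doc_weights in index.items():
        if summary.keys() & doc_weights.keys():
            scored.append((doc_id, relevance(summary, doc_weights)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


index = {
    "d1": {"search": 0.5, "database": 0.5},
    "d2": {"search": 0.25},
    "d3": {"cooking": 1.0},
}
results = search({"search": 2.0, "database": 1.0}, index)
```

Here "d1" scores 2.0 x 0.5 + 1.0 x 0.5 = 1.5, "d2" scores
2.0 x 0.25 = 0.5, and "d3" shares no word with the summary and is
not returned.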

[0023] Using the above method, the client 11 can
obtain a set of documents in the document database
141 which relates to an arbitrary set of key documents
in the document database 131. The characteristic feature
of this method is that network traffic is kept small by
leaving the processing (summary making) of the original
set of documents to the server side. It will be appreciated
that the amount of traffic is much less than in the case
where the client has to receive and process all of the
document information to be searched. The search assistant
module 112 of the client then basically has only to send the
summary of the set of documents from the source server to
the target server, and almost all of the processing involved
in the search can be left to the two servers. Moreover, the
server side merely has to have a summary making module and a
search module for the document database in question, and it
is therefore completely unnecessary to consider information
in other document databases.
[0024] In the aforesaid description, a method was
described wherein the document database 131 was the
source database and the document database 141 was
the target database, but the same method can be
adopted wherein the document database 141 is the
source database and the document database 131 is the
target database. In this case, the client obtains a summary
of the set of key documents from a summary making
module 142 of the server 14, transmits it to the
server 13 which is to be searched, and obtains relevant
documents in the document database 131 from the
search module 133 of the server 13. If the above is
generalized, and a server with a summary making module
and a search module is provided for a new document
database, this document database can be made to
function as the source database or target database for
all document databases connected to the network simply
by connecting the server to the network.
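
The interchangeable roles of the servers can be illustrated by the
sketch below, in which any object offering a summary making
operation and a search operation may act as either source or target.
InMemoryServer and cross_database_search are hypothetical names, no
real network transport is shown, and the summary and search logic
are deliberately simplified placeholders (plain relative word
frequency and unweighted documents) rather than the weightings
discussed above.

```python
from collections import Counter


class InMemoryServer:
    """Stand-in for a server such as 13 or 14 (assumption: no networking)."""

    def __init__(self, documents):
        self.documents = documents  # {document identifier: text}

    def make_summary(self, doc_ids):
        # Placeholder summary: relative frequency of each word in the set.
        counts = Counter(
            w for d in doc_ids for w in self.documents[d].lower().split()
        )
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def search(self, summary):
        # Placeholder OR search: add up the weightings of matching words.
        results = []
        for doc_id, text in self.documents.items():
            words = set(text.lower().split())
            score = sum(wt for w, wt in summary.items() if w in words)
            if score > 0:
                results.append((doc_id, score))
        return sorted(results, key=lambda r: r[1], reverse=True)


def cross_database_search(source, target, key_doc_ids, edit=None):
    """Client side: only identifiers and the summary cross the network."""
    summary = source.make_summary(key_doc_ids)
    if edit is not None:
        summary = edit(summary)  # optional user evaluation of the summary
    return target.search(summary)


news = InMemoryServer({"n1": "solar eclipse tonight", "n2": "election results"})
dictionary = InMemoryServer(
    {"e1": "eclipse obscuring of light", "e2": "election formal vote"}
)
forward_ids = [d for d, _ in cross_database_search(news, dictionary, ["n1"])]
reverse_ids = [d for d, _ in cross_database_search(dictionary, news, ["e2"])]
```

Because both servers expose the same two operations, swapping the
source and target arguments is all that is needed to search in the
opposite direction.
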
[0025] In Fig. 1, the summary making module and
search module (132, 133, and 142, 143) were respectively
located in different servers (13, 14), but this
embodiment of the invention is not limited to this
particular arrangement. For example, the summary making
module and search module may be installed in a different
server from the document database, and a summary making
module and search module provided by this server for
plural databases.
[0026] Finally, Fig. 2 shows an embodiment concerning
the client. 111 is an example of a search assistant
interface installed in the client. This is basically the
same as the interface proposed by the inventor of the
present application in Japanese Patent Laid-Open No.
Hei 11-85786, "Document search support method and
document search support service" (corresponding to
U.S. Patent Application S.N. 09/145,155, filed 09/01/98
by Nishioka et al), or Japanese Patent Laid-Open No. Hei
10-74210, "Document search support method and
device, and document search service using same"
(corresponding to U.S. Patent Application S.N. 08-888,017,
filed 07/03/97 by Niwa et al). E1 is a window for
inputting a search query, wherein the user can input a 
search query by a string of keywords or in the form of a 
sentence. M1 is a window for selecting a document 
database wherein the user can pull down a specific part 
on the right edge with a mouse to show a list of docu- 
ment databases, and select a desired document data- 
base. B1 is a search button which initiates a search. 
Therefore, the user inputs an arbitrary search query in 
the window E1, selects a document database to be 
searched in the window M1, and performs an ordinary 
search by keywords input to the window E1 concerning 
the document database selected in the window M1, by
pressing the button B1. This search is performed with 
the support of the search assistant module 112 shown 
in Fig. 1, but as the details of the search method were 
given in the previous application, they will not be 
repeated here. 

[0027] P1 is a window for displaying a search result.
In the upper part, a panel is displayed showing the total
number of documents retrieved by the search and the number
of documents selected by the user as described hereafter.
Underneath this are a selected/not selected panel (P13), in
which the user marks documents, and a document title part
showing, in the form of a list, the relevance (P12) to the
search query and the titles (P11) of the documents. This
display window has a scroll function so that, by scrolling,
the user can see a part which cannot be displayed at one
time. In the selected/not selected panel, a document is
selected or deselected each time there is a mouse click.
When documents are selected by clicking, a summary of the
corresponding documents is displayed as a graphical
representation of a set of words with weightings in a
summary display window P2. The summary display window P2
also has a panel in its upper part where the total number of
topic words and the number of topic words selected by the
user are displayed. Document titles are usually sorted in
order of relevance.

[0028] The window P1 for displaying the search 
result in the diagram shows that a total of 22 documents 
were retrieved as a result of the search, and that three 
documents were selected by the user as interesting 
documents judging from their titles. The selected docu- 
ments are given a check mark by clicking. In the sum- 
mary display window P2, five topic words are 
accordingly displayed corresponding to the selected 
documents. 

[0029] Although omitted from this embodiment,
conversely, documents for which the topic words
selected in the summary display window P2 are
representative can be displayed in the window P1.
Therefore, the user can perform a more advanced search by
making a summary customized according to his preference.
This is explained in detail in the aforesaid reference,
Japanese Patent Laid-Open No. Hei 11-85786.

[0030] Hence, the user can select/deselect docu- 
ments while referring to the titles and the topic words of 
selected documents, and can select plural documents 
in which he is interested. 

[0031] Subsequently, if the user wishes to consult
another document database for material relating to the set
of documents selected from this search result, he may change
the document database in the window M1, and press the
button B1 to begin a new search.

[0032] Hence, the client sends the identifiers of the
plural documents selected to the server where the
source document database is stored (for example, the
server 13), obtains a summary of these plural documents,
sends this summary to the server where the target
document database is stored (for example, the
server 14), and obtains a search result from the target
server (for example, the server 14). The new search
result is displayed in the window P1. In other words, in
this case, P1 is updated by the set of documents which
was newly searched.

[0033] To compare a new search result with a previous
search result, the user may press a back button B2
to re-display the previous search result in the window
P1, and return the display of the window P1 to its state
before the search was performed. Likewise, the window P1
can be advanced to the new search result by pushing a
forward button B3.
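
The behaviour of the back button B2 and forward button B3 can be
modelled by a simple history of search results, as in the sketch
below. The class name ResultHistory and its methods are
hypothetical, and discarding any "forward" results when a new
search is displayed is an assumption borrowed from common browser
behaviour, not something the specification states.

```python
class ResultHistory:
    """History of the search results shown in the window P1."""

    def __init__(self):
        self._results = []
        self._position = -1

    def show(self, result):
        # A new search replaces anything that was ahead of the
        # current position (assumed behaviour).
        del self._results[self._position + 1:]
        self._results.append(result)
        self._position += 1
        return result

    def back(self):
        # Button B2: re-display the previous search result, if any.
        if self._position > 0:
            self._position -= 1
        return self.current()

    def forward(self):
        # Button B3: advance to the newer search result, if any.
        if self._position < len(self._results) - 1:
            self._position += 1
        return self.current()

    def current(self):
        return self._results[self._position] if self._results else None


history = ResultHistory()
history.show(["doc-a"])      # first search result
history.show(["doc-b"])      # result of the search in another database
previous = history.back()    # returns ["doc-a"]
newest = history.forward()   # returns ["doc-b"]
```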

[0034] As the user can search other document
databases corresponding to such a search result at any
stage of the search, the user can freely proceed from
one database to another database by repeating the
search cycle. Naturally, it is also possible to repeat this
cycle within the same document database, i.e., without
changing the document database.

[0035] According to this invention, the user can
freely specify a document database to be searched and
freely enhance the search without concern for the location
or composition of each document database. Further,
as a server in which a document database is
located can be modularized, the server can be made to
function as a source database or a target database
with respect to all other databases connected to a network
simply by connecting a server comprising a summary
making module and a search module to the
network when it is desired to search new document
databases.

[0036] While the present invention has been
described above in conjunction with the preferred
embodiments, any person skilled in the art would be
enabled by this disclosure to make various modifications
to this embodiment and still be within the scope
and spirit of the invention as defined in the appended
claims.
Claims

1. A document search method having a function to 
change over between plural document databases 
(131, 141), and a function to search a set of documents
having a high relevance to a search input
from a selected document database in the order of 
higher relevance, this input being a set of keywords, 
fragments of a document or any desired set of doc- 
uments, wherein the search results from said docu- 
ment database (131) can be used as input for 
searching another database (141). 

2. A document search method as defined in claim 1, 
wherein an interface is provided in which a set of 
documents from the search result of one document 
database (131) can be selected or deselected, and 
a set of documents selected from the search result 
can be used as input to perform a search on 
another database (141). 

3. A document search method as defined in claim 1, 
wherein a summary containing only topic words in 
the search input is used to perform a search. 

4. A document search method as defined in claim 1, 
wherein servers (13, 14) comprising document 
databases (131, 141) and programs to manipulate 
said databases are dispersed over a network (12), 
a client (11) transmits a set of documents in a 
search input to a server (13) where a selected doc- 
ument database (131) is stored, receives a sum- 
mary comprising only topic words related to the set 
of documents which is sent, sends a search input 
corresponding to said summary reflecting a user's 
evaluation of the summary to a server (14) where 
another document database (141) is stored, and 
receives a search result. 

5. A document search method as defined in claim 4, 
wherein said server produces a summary from 
topic words relevant to a set of documents sent by 
the client (11) and transmits it to the client (11), and
searches and transmits a set of documents having 
a high relevance to any summary sent by the client 
(11), to the client (11). 

6. A document search method as defined in claim 4, 
wherein said client (11) has an interface (111) for 
specifying a set of documents for search input and 
document databases to be searched, the set of 
documents in the search input is sent to a server 
(13) specified by the user, a summary of the set of 
documents is received from this server (13), the 
summary received is sent to a server (14) comprising
another document database (141), and search
results are received from the latter server (14) and 
displayed. 



7. A service for searching documents wherein servers
(13, 14) comprising document databases (131,
141) and programs to manipulate said databases
(131, 141) are dispersed over a network (12) and a
client (11) connected to said servers (13, 14) performs
a document search, said service providing a
function for the client (11) to transmit a set of documents
in a search input to one of said servers (13,
14) where a selected document database (131,
141) is stored, receive a summary comprising only
topic words related to the set of documents which is
sent, send a search input corresponding to said
summary reflecting a user's evaluation of the summary
to a server where another document database
is stored, and receive a search result, wherein
said server produces a summary of topic words relevant
to the set of documents sent by the client and
transmits it to the client (11), and searches and
transmits a set of documents having a high relevance
to any desired summary sent by the client
(11), to the client (11).

8. A document search method as defined in claim 2, 
wherein a summary containing only topic words in
the search input is used to perform a search.

9. A document search method as defined in claim 2,
wherein servers (13, 14) comprising document
databases (131, 141) and programs to manipulate
said databases are dispersed over a network (12),
a client (11) transmits a set of documents in a
search input to a server (13) where a selected
document database (131) is stored, receives a summary
comprising only topic words related to the set
of documents which is sent, sends a search input
corresponding to said summary reflecting a user's
evaluation of the summary to a server (14) where
another document database (141) is stored, and
receives a search result.